
@fernantho commented Oct 15, 2025

What type of PR is this?
Feature

Which issue(s) does this PR fix?
Partially #15598

What does this PR do? Why is it needed?
This PR adds a way to calculate the generalized index of a given path within an SSZ object. It follows the get_generalized_index algorithm from the Merkle proofs section of the consensus specs.

Converting a PathElement path to a generalized index is needed to work with the fastssz proofs library.

The implementation walks the path through the SSZInfo struct the same way the consensus-layer spec does. Note, however, that the path input has a slightly different shape:

```python
def get_generalized_index(typ: SSZType, *path: PyUnion[int, SSZVariableName]) -> GeneralizedIndex:
    """
    Converts a path (eg. `[7, "foo", 3]` for `x[7].foo[3]`, `[12, "bar", "__len__"]` for
    `len(x[12].bar)`) into the generalized index representing its position in the Merkle tree.
    """
    ...
```

```go
// GetGeneralizedIndexFromPath calculates the generalized index for a given path.
// To calculate the generalized index, two inputs are needed:
// 1. The sszInfo of the root object, to be able to navigate the SSZ structure
// 2. The path to the field (e.g., "field_a.field_b[3].field_c")
// It walks the path step by step, updating the generalized index at each step.
func GetGeneralizedIndexFromPath(info *sszInfo, path []PathElement) (uint64, error) {
	// ...
}
```

The Python version expects the path as a sequence of elements (e.g. ["field_a", "field_b", 3]), while the Go version expects a dotted path string such as field_a.field_b[3].
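The loop described above can be sketched in Go as follows. This is a minimal, self-contained sketch that mirrors the spec's get_generalized_index, item_length, chunk_count and get_power_of_two_ceil. The sszType model and the generalizedIndex function are hypothetical stand-ins, not this PR's SszInfo or GetGeneralizedIndexFromPath, and only basic types, containers, vectors and lists are modelled (no bitfields or unions).

```go
package main

import (
	"fmt"
	"math/bits"
)

// Hypothetical, minimal type model for this sketch only (not Prysm's SszInfo).
type kind int

const (
	basic kind = iota
	container
	vector
	list
)

type sszType struct {
	kind       kind
	byteLen    uint64     // basic types only: size in bytes
	fieldNames []string   // containers only: field names in declaration order
	fields     []*sszType // containers only: field types
	elem       *sszType   // vectors and lists: element type
	length     uint64     // vector length, or list limit
}

// itemLength follows the spec's item_length.
func itemLength(t *sszType) uint64 {
	if t.kind == basic {
		return t.byteLen
	}
	return 32
}

// chunkCount follows the spec's chunk_count for the kinds modelled here.
func chunkCount(t *sszType) uint64 {
	switch t.kind {
	case container:
		return uint64(len(t.fields))
	case vector, list:
		return (t.length*itemLength(t.elem) + 31) / 32
	default:
		return 1
	}
}

// powerOfTwoCeil follows the spec's get_power_of_two_ceil.
func powerOfTwoCeil(x uint64) uint64 {
	if x <= 1 {
		return 1
	}
	return 1 << bits.Len64(x-1)
}

// generalizedIndex loops over the path like get_generalized_index. Each element
// is a container field name (string), an element index (uint64), or "__len__".
func generalizedIndex(t *sszType, path ...any) (uint64, error) {
	currentIndex := uint64(1)
	for _, p := range path {
		if t.kind == basic {
			return 0, fmt.Errorf("cannot descend into a basic type at %v", p)
		}
		if p == "__len__" {
			if t.kind != list {
				return 0, fmt.Errorf("__len__ is only valid on lists")
			}
			currentIndex = currentIndex*2 + 1
			t = &sszType{kind: basic, byteLen: 8} // the length is a uint64
			continue
		}
		var pos uint64
		var next *sszType
		if t.kind == container {
			name, _ := p.(string)
			found := false
			for i, n := range t.fieldNames {
				if n == name {
					pos, next, found = uint64(i), t.fields[i], true
					break
				}
			}
			if !found {
				return 0, fmt.Errorf("unknown field %v", p)
			}
		} else {
			idx, ok := p.(uint64)
			if !ok || idx >= t.length {
				return 0, fmt.Errorf("invalid index %v", p)
			}
			pos, next = idx*itemLength(t.elem)/32, t.elem
		}
		base := uint64(1)
		if t.kind == list {
			base = 2 // lists mix in their length, adding one tree level
		}
		currentIndex = currentIndex*base*powerOfTwoCeil(chunkCount(t)) + pos
		t = next
	}
	return currentIndex, nil
}
```

For a container {a: uint64, b: List[uint64, 8], c: Vector[Inner, 4]} with Inner = {x: uint64, y: uint64}, this sketch yields 4 for a, 11 for len(b), 21 for b[5] (element 5 is packed into the second of the two data chunks under b's length mix-in) and 53 for c[2].y.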

Other notes for review

  • I initially considered a recursive implementation but discarded it: since we already have the PathElement array, we can simply loop over it.
  • The input format should be snake_case, because the SSZ-generated type names are in snake_case.
  • The generalized indices used in the tests were obtained from a Python script based on the Consensus Layer spec: https://github.com/fernantho/generalized-indices-ground-truth/blob/main/generalized_indices.ipynb
  • Generalized indices for multidimensional arrays will be handled in a later iteration.

Acknowledgements

@fernantho marked this pull request as ready for review October 17, 2025 12:56
@fernantho force-pushed the feat/ssz-ql-parse-path-to-generalized-index branch from ce17749 to 73e3ee7 October 21, 2025 08:02
fernantho and others added 4 commits October 21, 2025 12:27
…e with length >= 1

If s does not contain sep and sep is not empty, Split returns a slice of
length 1 whose only element is s.
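The documented guarantee quoted in this commit message can be illustrated with a small sketch; splitPath is a hypothetical helper, not a function from the PR.

```go
package main

import "strings"

// splitPath splits a dotted path such as "field_a.field_b[3]" into segments.
// Per the strings.Split documentation, when sep is non-empty and absent from s
// the result is []string{s}, so the slice always has length >= 1 and a
// "len(parts) == 0" check after splitting can never fire.
func splitPath(path string) []string {
	return strings.Split(path, ".")
}
```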
@syjn99 (Contributor) left a comment

Overall good, my unofficial approval 👍

- Renamed itemLengthFromInfo to itemLength (the same name as in the spec).
- Rearranged all SSZ helpers.
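For reference, the spec's item_length and the way it decides which chunk an element lands in can be sketched as below. Reducing the SSZ type to a boolean flag plus a byte length, and the chunkIndex helper itself, are simplifications made for this sketch only.

```go
package main

// itemLength follows the spec's item_length: a basic value occupies its own
// byte length inside a packed 32-byte chunk, while any composite value
// occupies a full chunk (its hash tree root).
func itemLength(isBasic bool, byteLen uint64) uint64 {
	if isBasic {
		return byteLen
	}
	return 32
}

// chunkIndex returns the chunk holding element i of a vector or list whose
// items have length itemLen, like the first value of get_item_position.
func chunkIndex(i, itemLen uint64) uint64 {
	return i * itemLen / 32
}
```

With uint64 elements (8 bytes), four items share a chunk, so elements 0 to 3 live in chunk 0 and element 4 in chunk 1; composite elements get one chunk each.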
```go
}

// Starting from the root generalized index
root := uint64(1)
```
Contributor

Can you rename it to something like currentIndex? It's a bit odd to call it root, since it's not the root of anything.

Contributor Author

Sure!
I took the name root from the spec:

```python
def get_generalized_index(typ: SSZType, *path: PyUnion[int, SSZVariableName]) -> GeneralizedIndex:
    """
    Converts a path (eg. `[7, "foo", 3]` for `x[7].foo[3]`, `[12, "bar", "__len__"]` for
    `len(x[12].bar)`) into the generalized index representing its position in the Merkle tree.
    """
    root = GeneralizedIndex(1)
    ...
```

But I don't like it much myself, because I don't associate that name with an index.
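For reference, the update applied at each step of the spec's loop can be written as a recurrence. The notation is ours: $g_k$ is the index after $k$ steps, $c_k$ is chunk_count of the type being entered, $\mathrm{pos}_k$ is the chunk position returned by get_item_position, and $2^{\lceil \log_2 c_k \rceil}$ is get_power_of_two_ceil for $c_k \ge 1$. Only $g_0$ is the tree root; every later value is the index of a descendant node, which supports a name like currentIndex.

```math
g_0 = 1, \qquad
g_{k+1} =
\begin{cases}
2\,g_k + 1 & \text{if } p_k = \texttt{\_\_len\_\_}, \\
g_k \cdot b_k \cdot 2^{\lceil \log_2 c_k \rceil} + \mathrm{pos}_k & \text{otherwise},
\end{cases}
\qquad
b_k =
\begin{cases}
2 & \text{if the type at step } k \text{ is a List or ByteList}, \\
1 & \text{otherwise}.
\end{cases}
```

For example, selecting the field at position 2 of a three-field container at the root gives $g_1 = 1 \cdot 1 \cdot 2^{\lceil \log_2 3 \rceil} + 2 = 6$.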

```go
// e.g. "array[0][1]" -> []uint64{0, 1}. Errors if none are found or if any index is invalid.
func extractArrayIndices(name string) ([]uint64, error) {
	// Match all bracketed content, then we'll parse as unsigned to catch negatives explicitly
	re := arrayIndexRegex
```
Contributor

What's the purpose of assigning to a new variable instead of using arrayIndexRegex directly?

Contributor Author

No purpose at all. I'll remove it.
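A sketch of extractArrayIndices that uses the package-level regex directly, without the local alias, could look like this. The actual pattern behind arrayIndexRegex is not shown in the diff, so the one below is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// arrayIndexRegex matches every bracketed index such as "[0]" or "[12]".
// Assumed pattern for this sketch; compiled once at package level.
var arrayIndexRegex = regexp.MustCompile(`\[([^\]]*)\]`)

// extractArrayIndices returns all indices in a path element,
// e.g. "array[0][1]" -> []uint64{0, 1}.
func extractArrayIndices(name string) ([]uint64, error) {
	// n = -1 asks FindAllStringSubmatch for every match.
	matches := arrayIndexRegex.FindAllStringSubmatch(name, -1)
	if len(matches) == 0 {
		return nil, fmt.Errorf("no array indices found in %q", name)
	}
	indices := make([]uint64, 0, len(matches))
	for _, m := range matches {
		// m[0] is the whole "[...]" match; m[1] is the captured content.
		idx, err := strconv.ParseUint(m[1], 10, 64)
		if err != nil {
			return nil, fmt.Errorf("invalid array index: %w", err)
		}
		indices = append(indices, idx)
	}
	return indices, nil
}
```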

Comment on lines 103 to 109
```go
if strings.HasPrefix(raw, "-") {
	return nil, fmt.Errorf("cannot process negative indices %q", raw)
}
idx, err := strconv.ParseUint(raw, 10, 64)
if err != nil {
	return nil, fmt.Errorf("invalid array index: %w", err)
}
```
Contributor

Looking at the documentation for ParseUint, it doesn't allow negative numbers, so the HasPrefix check is unnecessary:

```go
// ParseUint is like [ParseInt] but for unsigned numbers.
//
// A sign prefix is not permitted.
```

Contributor Author

Thanks. Totally unnecessary check.

```go
if len(parts) != 2 {
	return nil, fmt.Errorf("invalid index notation in path element %s", elem)
}
re := regexp.MustCompile(`^\s*len\s*\(\s*([^)]+?)\s*\)\s*$`)
```
Contributor

How about extracting this regex to a package-level variable, just like you did with arrayIndexRegex?

Contributor Author

I forgot to extract this one. It's now a package-level variable.

```go
}
re := regexp.MustCompile(`^\s*len\s*\(\s*([^)]+?)\s*\)\s*$`)
matches := re.FindStringSubmatch(processingField)
if len(matches) == 2 {
```
Contributor

It would be nice to add an explanation of why 2 is expected.

Contributor Author

Added an explanation. I now consider regular expressions overkill here; when I added them, I had input validation and correction in mind.
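Why a successful match has length 2 can be shown with the same pattern as in the diff, hoisted to a package-level variable. The names lengthQueryRegex and lengthQueryTarget are hypothetical.

```go
package main

import "regexp"

// lengthQueryRegex matches "len(<path>)" and captures <path>. Compiled once at
// package level instead of on every call.
var lengthQueryRegex = regexp.MustCompile(`^\s*len\s*\(\s*([^)]+?)\s*\)\s*$`)

// lengthQueryTarget returns the path inside "len(...)".
// FindStringSubmatch returns nil on no match; otherwise it returns the full
// match at index 0 followed by one entry per capture group. The pattern has
// exactly one group, so a successful match has length 2 and the path is at [1].
func lengthQueryTarget(s string) (string, bool) {
	m := lengthQueryRegex.FindStringSubmatch(s)
	if len(m) != 2 {
		return "", false
	}
	return m[1], true
}
```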

```go
	},
	wantErr: false,
},
{
```
Contributor

Can you add even more test cases? The ones I can think of:

  • leading double dot --> error
  • trailing dot --> error
  • len(data) --> error
  • len(data.target.root) --> ok
  • len(data.target.root).foo --> error
  • data.target.len(root) --> error

The easiest way to get a bunch of test cases is to pass the regex to an AI and ask it to generate them. I expect it will overdo it, but some of the cases can be useful.

Contributor Author

@fernantho Oct 24, 2025

Adding them, but I see we don't follow the same convention for these queries:

  • len(data.target.root) --> error
  • data.target.len(root) --> ok

Since we first split the raw input on ".", the input len(data.target.root) would result in:

  • len(data
  • target
  • root)

leading to a wrong outcome.

In contrast, data.target.len(root) would succeed:

  • data
  • target
  • len(root)

Contributor Author

We must properly specify the input format for these queries.

Contributor Author

  • len(data) --> error

I don't see an error here either. Imagine we are querying the length of the validators field in the beacon state: the query would then be len(validators).

```go
// Helpers for Generalized Index calculation per type

// calculateLengthGeneralizedIndex calculates the generalized index for a length field.
// note: length fields are only valid for List and Bitlist types. Multi-dimensional arrays are not supported.
```
Contributor

Shouldn't this be supported for Vector and Bitvector too?

Contributor Author

Here I also followed the spec's algorithm:

```python
for p in path:
    # If we descend to a basic type, the path cannot continue further
    assert not issubclass(typ, BasicValue)
    if p == "__len__":
        assert issubclass(typ, (List, ByteList))
        typ = uint64
        root = GeneralizedIndex(root * 2 + 1)
```

As I understand it, Vector and Bitvector have no length field, because their size is fixed by their type.
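A sketch of the length-field step under that reading follows. lengthGeneralizedIndex and collectionKind are hypothetical. The spec's assert only admits List and ByteList, while this PR's note also accepts Bitlist, whose hash tree root is likewise mixed in with its length.

```go
package main

import "fmt"

// collectionKind is a hypothetical classification used only in this sketch.
type collectionKind int

const (
	listKind collectionKind = iota
	bitlistKind
	vectorKind
	bitvectorKind
)

// lengthGeneralizedIndex returns the generalized index of the length leaf.
// A List/Bitlist root is hash(data_root, length), so the data subtree sits at
// 2*parentIndex and the length at 2*parentIndex + 1. Vectors and Bitvectors
// have a type-fixed size and no length leaf, so the query is rejected.
func lengthGeneralizedIndex(kind collectionKind, parentIndex uint64) (uint64, error) {
	switch kind {
	case listKind, bitlistKind:
		return parentIndex*2 + 1, nil
	default:
		return 0, fmt.Errorf("no length leaf for collection kind %d", kind)
	}
}
```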


```go
// calculateLengthGeneralizedIndex calculates the generalized index for a length field.
// note: length fields are only valid for List and Bitlist types. Multi-dimensional arrays are not supported.
func calculateLengthGeneralizedIndex(fieldSsz *SszInfo, element PathElement, root uint64) (*SszInfo, uint64, error) {
```
Contributor

Similarly to the comment above (changing root to currentIndex), can you rename the root param to something like parentIndex (or something else more suitable)?

Contributor Author

Re-renamed them 😅
I had changed all of them from root to currentIndex, but now that I got to this comment, I prefer parentIndex over currentIndex for these functions.

@rkapka previously approved these changes Oct 27, 2025
@rkapka enabled auto-merge October 27, 2025 17:02
auto-merge was automatically disabled October 27, 2025 17:10

Head branch was pushed to by a user without write access

@rkapka enabled auto-merge October 27, 2025 17:57
@rkapka previously approved these changes Oct 27, 2025
auto-merge was automatically disabled October 27, 2025 22:16

Head branch was pushed to by a user without write access

@rkapka enabled auto-merge October 27, 2025 23:07
@rkapka added this pull request to the merge queue Oct 27, 2025
Merged via the queue into OffchainLabs:develop with commit 10a2f06 Oct 27, 2025
22 checks passed

3 participants